A Result for Orthogonal Plus Rank-1 Matrices
In this paper the sum of an orthogonal matrix and an outer product is
studied, and a relation between the norms of the vectors forming the outer
product and the singular values of the resulting matrix is presented. The main
result may be found in Theorem 1.
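The structure behind such a result can be sanity-checked numerically: since (Q + uv^T)^T (Q + uv^T) equals the identity plus a rank-2 perturbation, all but at most two singular values of the sum are exactly 1. A minimal NumPy sketch of this observation (an illustration, not the paper's Theorem 1 itself):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
# Random orthogonal matrix from a QR factorisation
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
u = rng.standard_normal(n)
v = rng.standard_normal(n)

A = Q + np.outer(u, v)  # orthogonal plus rank-1
s = np.linalg.svd(A, compute_uv=False)

# A^T A = I + (rank-2 term), so at least n - 2 singular values equal 1
n_unit = int(np.sum(np.isclose(s, 1.0)))
```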
Homography-Based Positioning and Planar Motion Recovery
Planar motion is an important and frequently occurring situation in mobile robotics applications. This thesis concerns estimation of ego-motion and pose of a single downwards-oriented camera under the assumptions of planar motion and known internal camera parameters. The so-called essential matrix (or its uncalibrated counterpart, the fundamental matrix) is frequently used in computer vision applications to compute a 3D reconstruction of the camera locations and the observed scene. However, if the observed points are expected to lie on a plane - e.g. the ground plane - the determination of these matrices becomes an ill-posed problem. Instead, methods based on homographies are better suited to this situation.

One section of this thesis is concerned with the extraction of the camera pose and ego-motion from such homographies. We present both a direct SVD-based method and an iterative method, both of which solve this problem. The iterative method is extended to allow simultaneous determination of the camera tilt from several homographies obeying the same planar motion model. This extension improves the robustness of the original method, and it provides consistent tilt estimates for the frames used in the estimation. The methods are evaluated in experiments on both real and synthetic data.

Another part of the thesis deals with the problem of computing the homographies from point correspondences. Conventional homography estimation methods produce a homography of too general a class, which is not guaranteed to be compatible with the planar motion assumption. For this reason, we enforce the planar motion model at the homography estimation stage with the help of a new homography solver based on a number of polynomial constraints on the entries of the homography matrix.
In addition to giving a homography of the right type, this method uses only 2.5 point correspondences instead of the conventional four, which is advantageous, e.g., when used in a RANSAC framework for outlier removal.
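For reference, the conventional four-point estimate that the 2.5-point solver improves upon is the direct linear transform (DLT); a bare-bones, unnormalised sketch (not the thesis's constrained solver):

```python
import numpy as np

def homography_dlt(src, dst):
    """Estimate H with dst ~ H src from four 2D point correspondences."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Cross-multiplied projection equations, linear in the entries of H
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    # The null vector of the 8x9 system is the stacked homography
    _, _, Vt = np.linalg.svd(np.asarray(rows, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]
```

Fewer correspondences per minimal sample directly reduce the number of RANSAC iterations required for a given inlier ratio, which is why 2.5 points beats four in the outlier-removal setting.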
Embed Me If You Can: A Geometric Perceptron
Solving geometric tasks involving point clouds by using machine learning is a
challenging problem. Standard feed-forward neural networks combine linear or,
if the bias parameter is included, affine layers and activation functions.
Their geometric modeling is limited, which motivated the prior work introducing
the multilayer hypersphere perceptron (MLHP). Its constituent part, i.e.,
hypersphere neuron, is obtained by applying a conformal embedding of Euclidean
space. By virtue of Clifford algebra, it can be implemented as the Cartesian
dot product of inputs and weights. If the embedding is applied in a manner
consistent with the dimensionality of the input space geometry, the decision
surfaces of the model units become combinations of hyperspheres and make the
decision-making process geometrically interpretable for humans. Our extension
of the MLHP model, the multilayer geometric perceptron (MLGP), and its
respective layer units, i.e., geometric neurons, are consistent with the 3D
geometry and provide a geometric handle of the learned coefficients. In
particular, the geometric neuron activations are isometric in 3D. When
classifying the 3D Tetris shapes, we quantitatively show that our model
requires no activation function in the hidden layers other than the embedding
to outperform the vanilla multilayer perceptron. In the presence of noise in
the data, our model is also superior to the MLHP.
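The conformal-embedding idea behind the hypersphere neuron can be illustrated compactly: embedding points and spheres into R^{n+2} turns the sphere-inclusion test into an ordinary Cartesian dot product. A hedged sketch of one common sign convention (conventions vary between formulations):

```python
import numpy as np

def embed_point(x):
    # Point x in R^n -> (x, -1, -||x||^2 / 2) in R^{n+2}
    return np.concatenate([x, [-1.0, -0.5 * (x @ x)]])

def embed_sphere(c, r):
    # Sphere (centre c, radius r) -> (c, (||c||^2 - r^2) / 2, 1)
    return np.concatenate([c, [0.5 * (c @ c - r * r), 1.0]])

def hypersphere_neuron(x, c, r):
    # The dot product equals -(||x - c||^2 - r^2) / 2:
    # positive inside the sphere, zero on it, negative outside.
    return embed_point(x) @ embed_sphere(c, r)
```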
TetraSphere: A Neural Descriptor for O(3)-Invariant Point Cloud Analysis
Rotation invariance is an important requirement for the analysis of 3D point
clouds. In this paper, we present a learnable descriptor for rotation- and
reflection-invariant 3D point cloud analysis based on recently introduced
steerable 3D spherical neurons and vector neurons. Specifically, we show the
compatibility of the two approaches and apply steerable neurons in an
end-to-end method; together, these constitute the technical novelty. In our
approach, we perform TetraTransform -- which lifts the 3D input to an
equivariant 4D representation, constructed by the steerable neurons -- and
extract deeper rotation-equivariant features using vector neurons. This
integration of the TetraTransform into the VN-DGCNN framework, termed
TetraSphere, inexpensively increases the number of parameters by less than
0.0007%. Taking only points as input, TetraSphere sets a new state-of-the-art
performance classifying randomly rotated real-world object scans of the hardest
subset of ScanObjectNN, even when trained on data without additional rotation
augmentation. Additionally, TetraSphere demonstrates the second-best
performance segmenting parts of the synthetic ShapeNet, consistently
outperforming the baseline VN-DGCNN. All in all, our results reveal the
practical value of steerable 3D spherical neurons for learning in 3D Euclidean
space.
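The O(3)-invariance that such a descriptor targets is easy to test empirically: apply a random rotation or reflection to the input and check that the output is unchanged. A toy stand-in descriptor (sorted pairwise distances, not TetraSphere itself) illustrates the test pattern:

```python
import numpy as np

def pairwise_distance_descriptor(points):
    # Sorted pairwise distances: trivially invariant under O(3)
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    return np.sort(d[np.triu_indices(len(points), k=1)])

rng = np.random.default_rng(42)
pts = rng.standard_normal((16, 3))
# Random orthogonal matrix: an element of O(3), possibly a reflection
R, _ = np.linalg.qr(rng.standard_normal((3, 3)))
desc_before = pairwise_distance_descriptor(pts)
desc_after = pairwise_distance_descriptor(pts @ R.T)
```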
DeDoDe: Detect, Don't Describe -- Describe, Don't Detect for Local Feature Matching
Keypoint detection is a pivotal step in 3D reconstruction, whereby sets of
(up to) K points are detected in each view of a scene. Crucially, the detected
points need to be consistent between views, i.e., correspond to the same 3D
point in the scene. One of the main challenges with keypoint detection is the
formulation of the learning objective. Previous learning-based methods
typically jointly learn descriptors with keypoints, and treat the keypoint
detection as a binary classification task on mutual nearest neighbours.
However, basing keypoint detection on descriptor nearest neighbours is a proxy
task, which is not guaranteed to produce 3D-consistent keypoints. Furthermore,
this ties the keypoints to a specific descriptor, complicating downstream
usage. In this work, we instead learn keypoints directly from 3D consistency.
To this end, we train the detector to detect tracks from large-scale SfM. As
these points are often overly sparse, we derive a semi-supervised two-view
detection objective to expand this set to a desired number of detections. To
train a descriptor, we maximize the mutual nearest neighbour objective over the
keypoints with a separate network. Results show that our approach, DeDoDe,
achieves significant gains on multiple geometry benchmarks. Code is provided at
https://github.com/Parskatt/DeDoDe
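The mutual-nearest-neighbour criterion discussed above is simple to state in code; a generic sketch using cosine similarity (an illustration of the criterion, not DeDoDe's training objective):

```python
import numpy as np

def mutual_nearest_neighbours(desc_a, desc_b):
    """Return index pairs (i, j) that are each other's nearest neighbour."""
    a = desc_a / np.linalg.norm(desc_a, axis=1, keepdims=True)
    b = desc_b / np.linalg.norm(desc_b, axis=1, keepdims=True)
    sim = a @ b.T
    nn_ab = sim.argmax(axis=1)        # best match in B for each A
    nn_ba = sim.argmax(axis=0)        # best match in A for each B
    ids = np.arange(len(desc_a))
    keep = nn_ba[nn_ab] == ids        # keep mutual agreements only
    return np.stack([ids[keep], nn_ab[keep]], axis=1)
```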
DKM: Dense Kernelized Feature Matching for Geometry Estimation
Feature matching is a challenging computer vision task that involves finding
correspondences between two images of a 3D scene. In this paper we consider the
dense approach instead of the more common sparse paradigm, thus striving to
find all correspondences. Perhaps counter-intuitively, dense methods have
previously shown inferior performance to their sparse and semi-sparse
counterparts for estimation of two-view geometry. This changes with our novel
dense method, which outperforms both dense and sparse methods on geometry
estimation. The novelty is threefold: First, we propose a kernel regression
global matcher. Secondly, we propose warp refinement through stacked feature
maps and depthwise convolution kernels. Thirdly, we propose learning dense
confidence through consistent depth and a balanced sampling approach for dense
confidence maps. Through extensive experiments we confirm that our proposed
dense method, \textbf{D}ense \textbf{K}ernelized Feature \textbf{M}atching,
sets a new state-of-the-art on multiple geometry estimation benchmarks. In
particular, we achieve an improvement on MegaDepth-1500 of +4.9 and +8.9
AUC compared to the best previous sparse method and dense method
respectively. Our code is provided at https://github.com/Parskatt/dk
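The kernel-regression idea in a global matcher can be sketched generically: regress a match location for each query feature as a kernel-weighted (softmax) average of candidate coordinates in the other image. This is a simplified illustration of the principle, not DKM's actual matcher:

```python
import numpy as np

def kernel_regression_match(feats_a, feats_b, coords_b, tau=0.1):
    """For each feature in A, regress a 2D location in image B."""
    sim = feats_a @ feats_b.T / tau            # similarity kernel
    w = np.exp(sim - sim.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)          # softmax weights
    return w @ coords_b                        # weighted coordinate average
```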
RoMa: Revisiting Robust Losses for Dense Feature Matching
Dense feature matching is an important computer vision task that involves
estimating all correspondences between two images of a 3D scene. In this paper,
we revisit robust losses for matching from a Markov chain perspective, yielding
theoretical insights and large gains in performance. We begin by constructing a
unifying formulation of matching as a Markov chain, based on which we identify
two key stages which we argue should be decoupled for matching. The first is
the coarse stage, where the estimated result needs to be globally consistent.
The second is the refinement stage, where the model needs precise localization
capabilities. Inspired by the insight that these stages concern distinct
issues, we propose a coarse matcher following the regression-by-classification
paradigm that provides excellent globally consistent, albeit not exactly
localized, matches. This is followed by a local feature refinement stage using
well-motivated robust regression losses, yielding extremely precise matches.
Our proposed approach, which we call RoMa, achieves significant improvements
compared to the state-of-the-art. Code is available at
https://github.com/Parskatt/RoM
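The refinement stage relies on robust regression losses; a classic member of this family is the Charbonnier loss, whose bounded gradient limits the influence of outlier correspondences (a generic illustration, not necessarily RoMa's exact loss):

```python
import numpy as np

def charbonnier(residual, eps=1e-3):
    """Smooth robust loss: ~ r^2 / (2*eps) near zero, ~ |r| for large r."""
    return np.sqrt(residual ** 2 + eps ** 2) - eps
```

Compared with a squared loss, large residuals contribute roughly linearly rather than quadratically, so a few bad matches cannot dominate the objective.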
Trust Your IMU: Consequences of Ignoring the IMU Drift
In this paper, we argue that modern pre-integration methods for inertial
measurement units (IMUs) are accurate enough to ignore the drift for short time
intervals. This allows us to consider a simplified camera model, which in turn
admits further intrinsic calibration. We develop the first-ever solver to
jointly solve the relative pose problem with unknown and equal focal length and
radial distortion profile while utilizing the IMU data. Furthermore, we show
significant speed-up compared to state-of-the-art algorithms, with small or
negligible loss in accuracy for partially calibrated setups. The proposed
algorithms are tested on both synthetic and real data, where the latter is
focused on navigation using unmanned aerial vehicles (UAVs). We evaluate the
proposed solvers on different commercially available low-cost UAVs, and
demonstrate that the novel assumption on IMU drift is feasible in real-life
applications. The extended intrinsic auto-calibration enables us to use distorted input images directly, making the tedious calibration processes required by current state-of-the-art methods obsolete.
Planar Motion and Visual Odometry: Pose Estimation from Homographies
This thesis concerns ego-motion and pose estimation of a single camera under the assumptions of planar motion and constant internal camera parameters. Planar motion is common for cameras mounted onto mobile robots, particularly in indoor scenarios, as they remain at a constant height above the ground plane. In Paper A, a parametrisation of the camera motion and pose is presented, along with an iterative approach for determining the parameters. Paper B describes how to extend the method in Paper A to use more than one homography at a time in the estimation process, thereby improving the estimation accuracy and robustness. Paper C presents an alternative method for estimating the distance between camera positions that is independent of the estimated orientation of the cameras.
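The planar motion assumption buys a drastic reduction in parameters: for an exactly downward-looking calibrated camera at constant height, the inter-frame homography over the ground plane reduces to a 2D rigid transform with three degrees of freedom (rotation angle plus translation), compared with eight for a general homography. A hedged sketch of this special form:

```python
import numpy as np

def planar_motion_homography(theta, tx, ty):
    """Ground-plane homography for an exactly downward-looking calibrated
    camera under planar motion: a 2D rigid transform (3 DoF)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, tx],
                     [s,  c, ty],
                     [0., 0., 1.]])
```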
Ego-Motion Recovery and Robust Tilt Estimation for Planar Motion Using Several Homographies
In this paper we suggest an improvement to a recent algorithm for estimating the pose and ego-motion of a camera which is constrained to planar motion at a constant height above the floor, with a constant tilt. Such motion is common in robotics applications where a camera is mounted onto a mobile platform and directed towards the floor. Due to the planar nature of the scene, images taken with such a camera will be related by a planar homography, which may be used to extract the ego-motion and camera pose. Earlier algorithms for this particular kind of motion were not concerned with determining the tilt of the camera, focusing instead on recovering only the motion. Estimating the tilt is a necessary step in order to create a rectified map for a SLAM system. Our contribution extends the aforementioned recent method, and we demonstrate that our enhanced algorithm gives more accurate estimates of the motion parameters.
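The rectification step that a tilt estimate enables can be sketched as a rotation-only (infinite-homography) warp: given intrinsics K and an estimated tilt rotation R_tilt, warping the image by K R_tilt^T K^{-1} simulates an exactly downward-looking camera. A minimal sketch under these assumptions:

```python
import numpy as np

def rectifying_homography(K, R_tilt):
    """Rotation-only warp K R_tilt^T K^{-1}: undoes the camera tilt so that
    the remaining inter-frame motion over the ground plane is 2D rigid."""
    return K @ R_tilt.T @ np.linalg.inv(K)
```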